ocr: $$\mathrm{Util}_{t+1}(s_t, a_t) = (1-\alpha)\,\mathrm{Util}_t(s_t, a_t) + \alpha\left[\mathrm{Reward}(s_t, a_t) + \gamma \max_b \mathrm{Util}_t(s_{t+1}, b)\right]$$

Figure 7: The RL rule for updating the estimated utility. $\alpha$ is a number less than one that determines the rate of change of the estimate. Note that the second part of the equation is similar to the equation in Figure 4, except there are no expectation signs E anywhere.
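As a rough illustration of the update rule in Figure 7, the sketch below applies it to a small lookup table. It is a minimal sketch, not the author's implementation: the function name `q_update`, the dictionary-based table, the state and action labels, and the values chosen for the learning rate (`alpha`, the number less than one that sets the rate of change) and the discount factor (`gamma`) are all assumptions made for the example.

```python
from collections import defaultdict


def q_update(util, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one step of the Figure 7 update to the table `util`.

    `util` maps (state, action) pairs to estimated utilities.
    All names and default values here are hypothetical.
    """
    # Best estimated utility obtainable from the next state (the max over b).
    best_next = max(util[(next_state, b)] for b in actions)
    # Sampled target: observed reward plus discounted best next utility.
    # No expectation is taken; a single observed transition is used.
    target = reward + gamma * best_next
    # Blend the old estimate with the target, weighted by alpha.
    util[(state, action)] = (1 - alpha) * util[(state, action)] + alpha * target
    return util[(state, action)]


# Hypothetical two-action problem; unseen entries default to 0.0.
util = defaultdict(float)
actions = ["left", "right"]

first = q_update(util, "s0", "right", reward=1.0, next_state="s1",
                 actions=actions, alpha=0.5, gamma=0.9)

# Pretend a later experience has raised the estimate for the next state.
util[("s1", "left")] = 2.0
second = q_update(util, "s0", "right", reward=1.0, next_state="s1",
                  actions=actions, alpha=0.5, gamma=0.9)
```

With `alpha` close to zero the estimate barely moves after each experience, while with `alpha` close to one it is almost entirely replaced by the latest sampled target.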